Computer vision research is driven by standard evaluation practices: current systems are judged by their accuracy on tasks such as object detection, segmentation and localisation. Methods such as convolutional neural networks perform well on these benchmarks, yet they remain far from the ultimate goal of understanding images the way humans do. Motivated by the human ability to understand an image and even tell a story about it, Geman ''et al.'' introduced the Visual Turing Test for computer vision systems. They describe it as “an operator-assisted device that produces a stochastic sequence of binary questions from a given test image”. The query engine produces a sequence of questions whose answers are unpredictable given the history of previous questions. The test concerns only vision and requires no natural language processing. The human operator either provides the correct answer to each question or rejects it as ambiguous. The query generator produces questions that follow a “natural story line”, similar to what humans do when they look at a picture.

== History ==

Research in computer vision dates back to the 1960s, when Seymour Papert first attempted to solve the problem in what became known as the Summer Vision Project. The attempt was unsuccessful because computer vision is far more complicated than it appears; its difficulty mirrors the complexity of the human visual system, in which roughly 50% of the brain is devoted to processing vision. Later attempts tried to solve the problem with models inspired by the human brain. Frank Rosenblatt's perceptron, an early form of neural network, was one of the first such approaches. These simple networks could not live up to expectations, and their limitations led researchers to set them aside.

With the availability of better hardware and processing power, research shifted to image processing, which involves pixel-level operations such as edge detection, de-noising and filtering. Considerable progress was made in this field, but the central problem of vision, making machines understand images, remained unaddressed. During this time neural networks resurfaced, as it was shown that the limitations of single-layer perceptrons could be overcome by multi-layer perceptrons. In the early 1990s convolutional neural networks emerged; they achieved strong results on digit recognition but did not scale well to harder problems.

The late 1990s and early 2000s saw the birth of modern computer vision, driven in part by the availability of key feature extraction and representation algorithms. These features, combined with existing machine learning algorithms, were used to detect, localise and segment objects in images. As these advances accumulated, the community felt the need for standardised datasets and evaluation metrics so that performance could be compared, which led to the emergence of challenges such as the PASCAL VOC challenge and the ImageNet challenge. The availability of standard evaluation metrics and open challenges gave direction to the research, and better algorithms were introduced for specific tasks such as object detection and classification.
The Visual Turing Test aims to give a new direction to computer vision research, leading to systems that are one step closer to understanding images the way humans do.
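The interaction described in the introduction can be summarised as a simple loop: a query engine proposes binary questions about the test image, and a human operator either answers each question or rejects it as ambiguous. The following is a minimal Python sketch of that loop; all names (QueryGenerator, visual_turing_test, the example questions) are illustrative assumptions and not part of any published implementation.

<syntaxhighlight lang="python">
# Minimal, illustrative sketch of the operator-assisted loop of the
# Visual Turing Test. Names and structure are assumptions for
# illustration, not a published implementation.

class QueryGenerator:
    """Proposes binary questions about an image.

    A real query engine would choose each question so that its answer is
    unpredictable given the history and so that the sequence follows a
    natural story line; here we simply walk through a fixed list.
    """

    def __init__(self, candidate_questions):
        self.candidates = list(candidate_questions)
        self.index = 0

    def next_question(self, image, history):
        if self.index >= len(self.candidates):
            return None  # no candidate questions remain
        question = self.candidates[self.index]
        self.index += 1
        return question


def visual_turing_test(image, generator, operator, max_questions=20):
    """Run the loop: the generator proposes a binary question, the human
    operator answers 'yes'/'no' or rejects it as 'ambiguous', and each
    answered question is appended to the history."""
    history = []  # list of (question, answer) pairs
    for _ in range(max_questions):
        question = generator.next_question(image, history)
        if question is None:
            break
        answer = operator(image, question)  # 'yes', 'no', or 'ambiguous'
        if answer == 'ambiguous':
            continue  # ambiguous questions are rejected, not recorded
        history.append((question, answer))
    return history


# Example usage with a stand-in "operator" that answers every question 'yes'.
if __name__ == "__main__":
    generator = QueryGenerator([
        "Is there a person in the designated region?",
        "Is the person carrying something?",
        "Are the two people talking to each other?",
    ])
    transcript = visual_turing_test("street_scene.jpg", generator,
                                    operator=lambda img, q: "yes")
    print(transcript)
</syntaxhighlight>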